Comparison on the Effectiveness of Different Statistical Similarity Measures

Author

  • Safaa I. Hajeer
Abstract

Document retrieval is the process of matching a stated user query against a set of free-text records (documents); it is one of the major techniques for organizing and managing information. This project studied which of the statistical similarity measures used in information retrieval (IR) is most effective for document retrieval, using a unified set of documents. The results show that the Cosine Similarity Measure outperforms the seven other measures (Inner Product, Dice Coefficient, Jaccard Coefficient, Inclusion Similarity Coefficient, Overlap Coefficient Measure, Euclidean Distance Measure, and Manhattan Distance Measure (City Block Distance)) for both languages. On the Arabic collection, its precision is 38% and its recall 53.2%; on the English collection, its precision is 25% and its recall 65%.
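As a rough illustration of the measures named in the abstract, the sketch below implements six of them on plain term-weight vectors, along with precision and recall. It uses common textbook definitions, which are an assumption here: the abstract does not give the paper's formulas, and the Inclusion and Overlap coefficients are omitted because their definitions vary between sources. The vectors, document names, and function names are all hypothetical.

```python
import math

# Textbook vector-space definitions; assumed, not taken from the paper.
def inner_product(q, d):
    return sum(a * b for a, b in zip(q, d))

def cosine(q, d):
    norm = math.sqrt(inner_product(q, q)) * math.sqrt(inner_product(d, d))
    return inner_product(q, d) / norm if norm else 0.0

def dice(q, d):
    denom = inner_product(q, q) + inner_product(d, d)
    return 2 * inner_product(q, d) / denom if denom else 0.0

def jaccard(q, d):
    dot = inner_product(q, d)
    denom = inner_product(q, q) + inner_product(d, d) - dot
    return dot / denom if denom else 0.0

# Distances: smaller means more similar, unlike the four measures above.
def euclidean(q, d):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, d)))

def manhattan(q, d):
    return sum(abs(a - b) for a, b in zip(q, d))

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy data: one query and three documents over three terms.
query = [1.0, 1.0, 0.0]
docs = {"d1": [1.0, 1.0, 0.0], "d2": [1.0, 0.0, 1.0], "d3": [0.0, 0.0, 1.0]}

# Rank documents by descending cosine similarity to the query.
ranking = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
```

To rank with a distance measure instead, sort in ascending order, for example `sorted(docs, key=lambda name: euclidean(query, docs[name]))`.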

Similar resources

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

Comparison of Different Distance Measures on Hierarchical Document Clustering in 2-Pass Retrieval

Hierarchic document clustering has been applied to search results (query-specific clustering) on the grounds of its potential improved effectiveness compared both to that of static clustering and of conventional inverted file search (IFS). In this paper we review and compare the effects of seven different measures of similarity among documents in hierarchic query-specific clustering. We have c...

Intuitionistic Fuzzy Information Measures with Application in Rating of Township Development

Predominantly in the faltering atmosphere, the precise value of some factors is difficult to measure. However, it can be easily approximated by an intuitionistic fuzzy linguistic term in real-world problems. To deal with such situations, in this paper two information measures based on trigonometric function for intuitionistic fuzzy sets, which are a generalized version of the fuzzy informat...

A new vector valued similarity measure for intuitionistic fuzzy sets based on OWA operators

Much research has been carried out on measures of distance, similarity, and correlation between intuitionistic fuzzy sets (IFSs). However, most of them are single-valued measures and lack potential for efficiency validation. In this paper, a new vector valued similarity measure for IFSs is proposed based on OWA operators. The vector is defined as a two-tuple consisting of ...

Comparison of the effectiveness of integrative transdiagnostic treatment and dialectical behavior therapy on hope and pain perception among cancer patients in Isfahan

Background: Integrative transdiagnostic and dialectical behavioral therapy seem to affect the hope and pain perception of cancer patients. The aim of this study was to compare the effectiveness of integrated transdiagnostic therapy and dialectical behavioral therapy on hope and pain perception of cancer patients in Isfahan. Materials and methods: The method of this study was quasi-experimental...

Publication date: 2012